---
title: "Supplementary appendix"
author:
  - name: "Claudius Gräbner-Radkowitsch"
    affiliation: "Department of Pluralist Economics, Europa Universität Flensburg, Germany; Institute for the Comprehensive Analysis of the Economy, Johannes Kepler University Linz, Austria"
    email: "claudius.graebner-radkowitsch@uni-flensburg.de"
  - name: "Birte Strunk"
    affiliation: "Department of Economics, Bard College, United States"
    email: "bstrunk@bard.edu"
date: today
format:
  docx:
    toc: true
    number-sections: true
    reference-doc: TEMPLATE.docx
    highlight-style: github
    fig-cap-location: bottom
    df-print: kable
    tbl-colwidths: auto
    tbl-cap-location: top
    fig-format: png
    fig-dpi: 300
    fig-width: 6
    fig-height: 4
    echo: false
    warning: false
    message: false
bibliography: GlobalDependencies.bib
csl: chicago-author-date.csl
citation-style: author-date
suppress-bibliography: false
link-citations: true
colorlinks: true
fontfamily: times
mainfont: "Times New Roman"
sansfont: "Arial"
monofont: "Courier New"
---

``` {python}
import json
import pandas as pd 
import wbdata
import mario
import matplotlib.pyplot as plt 
import numpy as np
import pymrio
import subprocess
import os
from pathlib import Path

def here(*args):
    """Python equivalent of R's here() function"""
    # Find project root by looking for common project markers
    current = Path.cwd()
    while current != current.parent:
        markers = [".git", "requirements.txt", ".python-version", "pyproject.toml"]
        if any((current / marker).exists() for marker in markers):
            project_root = current
            break
        current = current.parent
    else:
        project_root = Path.cwd()
    
    # Join the provided path components
    return project_root.joinpath(*args)

def format_millions(value):
    """Format large numbers in millions with M suffix"""
    return f"{value/1_000_000:.1f}M"

def format_thousands(value):
    """Format numbers in thousands with K suffix"""
    return f"{value/1_000:.0f}K"

def format_currency(value):
    """Format currency values with a dollar sign and one decimal place"""
    return f"${value:.1f}"

def format_currency_millions(value):
    """Format currency values in millions"""
    return f"${value/1_000_000:.1f}M"

def format_with_commas(value):
    """Format numbers with comma separators"""
    return f"{value:,.0f}"

def format_number(value):
    """Format numbers"""
    return f"{value:,.0f}"

def format_percentage(value, decimals=1):
    """Format as percentage with specified decimal places"""
    return f"{value:.{decimals}f}%"

def smart_format(value, threshold_millions=1_000_000, threshold_thousands=1_000):
    """Automatically choose the best format based on value size"""
    if abs(value) >= threshold_millions:
        return format_millions(value)
    elif abs(value) >= threshold_thousands:
        return format_thousands(value)
    else:
        return f"{value:.1f}"
```

# Appendix A: Country Classifications

``` {python}
#| label: tbl-country-classifications
#| tbl-cap: "Country Classifications: Global North and Global South"
#| echo: false

# Load country names mapping
with open(here("data", "tidy", "country_names.json"), 'r') as f:
    country_names_dict = json.load(f)

with open(here("data", "tidy", "north_codes.json")) as f:
    north_codes = set(json.load(f))

with open(here("data", "tidy", "south_codes.json")) as f:
    south_codes = set(json.load(f))

# Convert sets to sorted lists for consistent ordering
north_countries_sorted = sorted(list(north_codes))
south_countries_sorted = sorted(list(south_codes))

# Get full country names, ensuring we use the mapping for both groups
north_full_names = [country_names_dict.get(code, code) for code in north_countries_sorted]
south_full_names = [country_names_dict.get(code, code) for code in south_countries_sorted]

# Format country lists as comma-separated strings
north_countries_formatted = ", ".join(north_full_names)
south_countries_formatted = ", ".join(south_full_names)

# Create classification table with country lists
classification_table = pd.DataFrame({
    'Global North': [north_countries_formatted],
    'Global South': [south_countries_formatted]
})

# Set pandas display options to show full content without truncation
pd.set_option('display.max_colwidth', None)
pd.set_option('display.width', None)
pd.set_option('display.max_columns', None)

classification_table
```

```{python}
#| label: tbl-group-descriptives
#| tbl-cap: "Descriptive Statistics by Country Group (2017)"
#| echo: false

# Load the emissions data
stressor_data = pd.read_csv(here("data", "tidy", "emissions_table.csv"))

# Get population data for each group
population_data_path = here("data", "tidy", "population_data.csv")
population_data = pd.read_csv(population_data_path)
north_pop_data = population_data[population_data['country'].isin(north_codes)]
south_pop_data = population_data[population_data['country'].isin(south_codes)]

population_overview_path = here("data", "tidy", "population_overview.csv")
population_overview = pd.read_csv(population_overview_path)

pop_north = population_overview.loc[population_overview["region"] == "GlobalNorth", "Population"]
pop_south = population_overview.loc[population_overview["region"] == "GlobalSouth", "Population"]
pop_china_india = population_overview.loc[population_overview["region"] == "ChinaIndia", "Population"]
        
population_north_final = int(pop_north.iloc[0]) if len(pop_north) > 0 else 0
population_south_final = int(pop_south.iloc[0]) if len(pop_south) > 0 else 0
population_china_india_final = int(pop_china_india.iloc[0]) if len(pop_china_india) > 0 else 0  

# Extract values from stressor_data
north_emissions = stressor_data.loc[stressor_data['Region'] == 'Global North', 'Emissions (Mt)'].iloc[0]
south_emissions = stressor_data.loc[stressor_data['Region'] == 'Global South', 'Emissions (Mt)'].iloc[0]
north_materials = stressor_data.loc[stressor_data['Region'] == 'Global North', 'Materials (M)'].iloc[0]
south_materials = stressor_data.loc[stressor_data['Region'] == 'Global South', 'Materials (M)'].iloc[0]
north_per_cap_emissions = stressor_data.loc[stressor_data['Region'] == 'Global North', 'Per Cap Em.'].iloc[0]
south_per_cap_emissions = stressor_data.loc[stressor_data['Region'] == 'Global South', 'Per Cap Em.'].iloc[0]
north_per_cap_materials = stressor_data.loc[stressor_data['Region'] == 'Global North', 'Per Cap Mat.'].iloc[0]
south_per_cap_materials = stressor_data.loc[stressor_data['Region'] == 'Global South', 'Per Cap Mat.'].iloc[0]

# Calculate basic descriptives
descriptive_stats = {
   'Indicator': [
       'Number of countries',
       'Total population (millions)',
       'Average population per country (millions)',
       'Median population per country (millions)',
       'Share of global population (%)',
       'Total GHG emissions (Mt CO₂-eq)',
       'Per capita emissions (t CO₂-eq/person)',
       'Total raw material inputs (millions)',
       'Per capita material inputs'
   ],
   'Global North': [
       len(north_codes),
       f"{population_north_final / 1_000_000:.1f}",
       f"{north_pop_data['population'].mean() / 1_000_000:.1f}",
       f"{north_pop_data['population'].median() / 1_000_000:.1f}",
       f"{(population_north_final / (population_north_final + population_south_final)) * 100:.1f}",
       f"{north_emissions:.1f}",
       f"{north_per_cap_emissions:.3f}",
       f"{north_materials:.1f}",
       f"{north_per_cap_materials:.2f}"
   ],
   'Global South': [
       len(south_codes),
       f"{population_south_final / 1_000_000:.1f}",
       f"{south_pop_data['population'].mean() / 1_000_000:.1f}",
       f"{south_pop_data['population'].median() / 1_000_000:.1f}",
       f"{(population_south_final / (population_north_final + population_south_final)) * 100:.1f}",
       f"{south_emissions:.1f}",
       f"{south_per_cap_emissions:.3f}",
       f"{south_materials:.1f}",
       f"{south_per_cap_materials:.2f}"
   ]
}

descriptive_table = pd.DataFrame(descriptive_stats)
descriptive_table
```

The country classification we use for our analysis is summarized in @tbl-country-classifications.
It largely follows @dorninger2021, who categorize countries based on World Bank income classifications (according to a country's GNI per capita) but correct for population sizes. The Global North consists of high-income countries, while the Global South encompasses low-income, lower-middle-income, and upper-middle-income countries. This classification excludes China and India from both groups due to their unique economic characteristics and substantial influence: because of their sheer size, they would completely dominate the results for their respective group.

@tbl-group-descriptives provides key demographic and environmental indicators for the two groups. The Global North comprises 
`{python} len(north_codes)` 
countries with a combined population of 
`{python} format_millions(population_north_final)` 
people, representing 
`{python} format_percentage((population_north_final / (population_north_final + population_south_final)) * 100, 1)` 
of the global population covered in our analysis. The Global South includes `{python} len(south_codes)` countries with 
`{python} format_millions(population_south_final)` 
inhabitants, accounting for 
`{python} format_percentage((population_south_final / (population_north_final + population_south_final)) * 100, 1)` 
of that population.

When looking at environmental indicators, the stark asymmetries between North and South become obvious: while both regions produce similar total greenhouse gas emissions, the Global North's per capita emissions are 
`{python} format_number(north_per_cap_emissions/south_per_cap_emissions)` 
times higher than those of the Global South. For raw materials, the Global South uses significantly more in absolute terms, but the Global North consumes 
`{python} format_number(north_per_cap_materials/south_per_cap_materials)` times more per capita. These patterns reflect the substantial inequalities in resource consumption and environmental impact between developed and developing economies that motivated our analysis of structural dependencies in the first place.

# Appendix B: Details on methodological approach

This section provides additional details on the formal methodology used to compute the dependency shares in the main paper. For a more general introduction to input-output modelling, see Miller and Blair [-@MillerIO].
The code used for our analysis is publicly available as Gräbner-Radkowitsch and Strunk [-@dependencecode].

Input-output (I-O) analysis goes back to the contributions of Wassily Leontief in the 1930s. His key motivation was to develop a formal framework that is closely aligned with how economic data are compiled by administrative organisations (i.e., closely following the systems of national accounting), and that allows for an analysis of the interdependencies between different sectors of an economy. Correspondingly, I-O analysis represents the economy as a network of interconnected sectors (or 'industries'), where each sector uses inputs from other sectors to produce its output. Thus, it is particularly useful for understanding how economic shocks or policy changes in one sector propagate through the entire economic system.

The starting point of I-O analysis is the fact that the production of any good or service requires inputs from various other sectors, creating a web of dependencies that can be quantified through mathematical relationships. In its classical form, I-O analysis was concerned with a single national economy. When extended to multiple regions or countries, however, it becomes a powerful tool for examining international trade dependencies and the potential impacts of policy changes across national boundaries.

## Mathematical Formulation

### Basic Input-Output Table Structure

An input-output table is organized as a matrix that captures all monetary transactions within an economy over a specific period (typically one year).^[There are also *physical* I-O tables that consider flows of physical material. Moreover, monetary I-O tables are sometimes augmented with physical tables that track, for instance, the environmental impact of production and consumption activities. EORA is one such 'environmentally augmented' I-O table.] The table consists of three main components:

1. Intermediate transactions matrix ($\mathbf{Z}$): This is an $n\times n$ square matrix that records the monetary flows between the $n$ sectors in the economy
2. Final demand matrix ($\mathbf{Y}$): This is an $n\times m$ matrix that captures sales to end users (households, government, exports), with $m$ denoting the number of final demand categories; these are flows that 'leave' the circular economy represented by $\mathbf{Z}$
3. Value-added matrix ($\mathbf{V}$): This is a $k\times n$ matrix that includes wages, profits, and taxes, with $k$ denoting the number of primary input categories; these are flows that 'enter' the circular economy represented by $\mathbf{Z}$ as primary inputs

The *final demand* represents the portion of each sector's output that is consumed by end users rather than used as intermediate inputs by other sectors. It usually comprises household consumption, government spending, investment (capital formation), and net exports. Final demand is crucial because it represents the ultimate destination of production in the economy - to satisfy the consumption needs of households, government requirements, capital accumulation, and export markets.

*Value added* or *primary inputs* constitute the complement to final demand on the input side of the economy. While final demand shows where outputs go, value added represents the primary inputs that enter the production process. Value added includes compensation of employees (wages and salaries), gross operating surplus (profits), taxes on production, and depreciation of capital. These primary inputs, along with intermediate inputs from other sectors, enable each industry to produce its total output. The sum of value added across all sectors equals the gross domestic product of the economy.

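The fundamental accounting identity underlying I-O tables requires that total inputs equal total outputs for each sector. On the output side, each sector's total output is the sum of its intermediate sales to all sectors and its final demand: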

$$x_i = \sum_{j=1}^{n} z_{ij} + y_i$$

where:

- $x_i$ = total output of sector $i$
- $z_{ij}$ = intermediate sales from sector $i$ to sector $j$  
- $y_i$ = final demand for sector $i$'s output
- $n$ = number of sectors

Following the convention that lower-case bold letters represent vectors and upper-case bold letters represent matrices [@MillerIO, 12], the previous equation can be written for the entire economy as 

$$\mathbf{x} = \mathbf{Z}\mathbf{i} + \mathbf{y}$$

where 

$$\mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \quad \mathbf{Z} = \begin{bmatrix} z_{11} & \cdots & z_{1n} \\ \vdots & \ddots & \vdots \\ z_{n1} & \cdots & z_{nn} \end{bmatrix} \quad \text{and} \quad \mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$$

Here, $\mathbf{i}$ is an $n$-dimensional column vector of ones, so that $\mathbf{Z}\mathbf{i}$ yields the row sums of $\mathbf{Z}$, i.e. each sector's total intermediate sales.
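
As an illustration, the identity can be checked numerically. The following is a minimal sketch with hypothetical numbers for a three-sector economy (they are not taken from the EORA data); it only shows that the matrix form and the element-wise form give the same total outputs.

```python
import numpy as np

# Hypothetical three-sector economy (illustrative numbers, not EORA data).
# Z[i, j] = intermediate sales from sector i to sector j.
Z = np.array([
    [15.0, 25.0,  5.0],
    [20.0, 35.0, 10.0],
    [10.0, 15.0, 30.0],
])
y = np.array([55.0, 35.0, 45.0])   # final demand for each sector's output

i = np.ones(Z.shape[0])            # summation vector of ones
x = Z @ i + y                      # matrix form: x = Z i + y

# Element-wise form: x_i = sum_j z_ij + y_i
x_elementwise = np.array([Z[r, :].sum() + y[r] for r in range(len(y))])
```

In this example every sector has a total output of 100, which keeps the later coefficient calculations easy to follow by hand.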

### Technical Coefficients Matrix

The core of I-O analysis lies in the technical coefficients matrix $\mathbf{A}$, which captures the direct input requirements per unit of output. Each element $a_{ij}$ represents the amount of input from sector $i$ required to produce one unit of output in sector $j$:

$$a_{ij} = \frac{z_{ij}}{x_j}$$

This gives the technical coefficients matrix:

$$\mathbf{A} = \mathbf{Z} \cdot \hat{\mathbf{x}}^{-1}$$

where $\hat{\mathbf{x}}$ is a diagonal matrix of total outputs:

$$\hat{\mathbf{x}} = \begin{bmatrix} x_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & x_n \end{bmatrix}, \quad \hat{\mathbf{x}}^{-1} = \begin{bmatrix} 1/x_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1/x_n \end{bmatrix}$$

It is a crucial and potentially problematic assumption of I-O analysis that $\mathbf{A}$ is fixed, meaning that economies of scale or technological innovations cannot be accounted for.
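
The computation of $\mathbf{A}$ can be sketched in a few lines. The numbers are hypothetical and chosen only for illustration; the variable names are not those of our published code.

```python
import numpy as np

# Hypothetical intermediate transactions and total outputs (not EORA data).
Z = np.array([
    [15.0, 25.0,  5.0],
    [20.0, 35.0, 10.0],
    [10.0, 15.0, 30.0],
])
x = np.array([100.0, 100.0, 100.0])

# Matrix form: A = Z * x_hat^{-1}, i.e. column j of Z is divided by x_j.
A = Z @ np.diag(1.0 / x)

# Equivalent and cheaper: NumPy broadcasting divides column j by x[j].
A_broadcast = Z / x
```

Here `A[0, 1]` equals 0.25: sector 2 needs 0.25 units of input from sector 1 per unit of its own output.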

### The Fundamental Input-Output Equation and the Leontief Inverse

The relationship between total output, intermediate demand, and final demand can be expressed as:

$$\mathbf{x} = \mathbf{A}\mathbf{x} + \mathbf{y}$$

where $\mathbf{y}$ is the vector of final demand for each sector's output.

Rearranging to solve for total output:

$$\mathbf{x} - \mathbf{A}\mathbf{x} = \mathbf{y}$$

$$(\mathbf{I} - \mathbf{A})\mathbf{x} = \mathbf{y}$$

$$\mathbf{x} = (\mathbf{I} - \mathbf{A})^{-1}\mathbf{y}$$

The matrix $(\mathbf{I} - \mathbf{A})^{-1}$ is known as the *Leontief inverse* or *total requirements matrix*, denoted as $\mathbf{L}$:

$$\mathbf{L} = (\mathbf{I} - \mathbf{A})^{-1}$$

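Each element $l_{ij}$ of this matrix represents the total (direct plus indirect) amount of output from sector $i$ required to satisfy one unit of final demand for sector $j$'s output. Because $\mathbf{L} = \mathbf{I} + \mathbf{A} + \mathbf{A}^2 + \dots$ (which holds for any productive economy), the diagonal elements are at least one: they contain the initial unit of final demand plus any direct and indirect requirements of a sector for its own output. The off-diagonal elements capture the direct and indirect requirements from other sectors through the production network.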
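
The following minimal sketch computes the Leontief inverse for a hypothetical three-sector coefficient matrix and compares it with a truncated series $\mathbf{I} + \mathbf{A} + \mathbf{A}^2 + \dots$, which makes the successive rounds of input requirements explicit. All numbers are illustrative assumptions.

```python
import numpy as np

# Hypothetical technical coefficients and final demand (not EORA data).
A = np.array([
    [0.15, 0.25, 0.05],
    [0.20, 0.35, 0.10],
    [0.10, 0.15, 0.30],
])
y = np.array([55.0, 35.0, 45.0])

I = np.eye(A.shape[0])
L = np.linalg.inv(I - A)     # Leontief inverse
x = L @ y                    # total output needed to satisfy final demand y

# Truncated power series: each term A^k adds the k-th round of requirements.
L_series = sum(np.linalg.matrix_power(A, k) for k in range(200))
```

For large systems, solving the linear system with `np.linalg.solve(I - A, y)` is usually preferable to forming the inverse, but the explicit inverse is needed whenever the individual multipliers $l_{ij}$ are of interest.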

## Multi-Regional Input-Output Extension

When analyzing trade dependencies between regions, the basic I-O framework can be extended to multiple regions. The biggest challenge in this context is to harmonize the data obtained for different countries and to take care of the many accounting problems that occur in practice. These problems, however, are not visible in the equations themselves, which is why the mathematical formulation might appear trivial at first: it looks as if one were simply "stacking" the I-O tables of different regions.

For a two-region system (which we might interpret as Global North and Global South), the technical coefficients matrix becomes:

$$\mathbf{A} = \begin{pmatrix}
\mathbf{A}^{NN} & \mathbf{A}^{NS} \\
\mathbf{A}^{SN} & \mathbf{A}^{SS}
\end{pmatrix}$$

where:

- $\mathbf{A}^{NN}$ = technical coefficients for North-North transactions
- $\mathbf{A}^{NS}$ = technical coefficients for North-South transactions
- $\mathbf{A}^{SN}$ = technical coefficients for South-North transactions  
- $\mathbf{A}^{SS}$ = technical coefficients for South-South transactions

Each of these matrices contains technical coefficients for a particular subset of transactions. For instance, if the North consists of three sectors (e.g., agriculture, manufacturing, and services), the matrix $\mathbf{A}^{NN}$ might look like:

$$\mathbf{A}^{NN} = \begin{pmatrix}
0.15 & 0.25 & 0.05 \\
0.20 & 0.35 & 0.10 \\
0.10 & 0.15 & 0.30
\end{pmatrix}$$

where each element $a_{ij}^{NN}$ represents the amount of input from Northern sector $i$ required per unit of output in Northern sector $j$. For example, $a_{12}^{NN} = 0.25$ indicates that manufacturing in the North requires 0.25 units of agricultural inputs per unit of manufacturing output produced within the North.

The corresponding Leontief inverse for the two-region system is:

$$\mathbf{L} = \begin{pmatrix}
\mathbf{L}^{NN} & \mathbf{L}^{NS} \\
\mathbf{L}^{SN} & \mathbf{L}^{SS}
\end{pmatrix} = \begin{pmatrix}
\mathbf{I} - \mathbf{A}^{NN} & -\mathbf{A}^{NS} \\
-\mathbf{A}^{SN} & \mathbf{I} - \mathbf{A}^{SS}
\end{pmatrix}^{-1}$$

This matrix captures how final demand changes in one region affect production in all regions through both direct trade links and indirect effects through global supply chains.
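
A minimal sketch of the two-region case, assuming two sectors per region and purely hypothetical coefficients, shows how the blocks are assembled and inverted. It also contrasts the South-South block of the global inverse with the inverse of the South-only coefficient matrix, which ignores feedback effects that run through the North.

```python
import numpy as np

# Hypothetical coefficient blocks, two sectors per region (not EORA data).
A_NN = np.array([[0.20, 0.10], [0.15, 0.25]])
A_NS = np.array([[0.05, 0.02], [0.03, 0.04]])
A_SN = np.array([[0.04, 0.06], [0.02, 0.03]])
A_SS = np.array([[0.25, 0.10], [0.10, 0.20]])

n = A_NN.shape[0]                       # sectors per region
A = np.block([[A_NN, A_NS],
              [A_SN, A_SS]])            # stacked two-region matrix
L = np.linalg.inv(np.eye(2 * n) - A)    # global Leontief inverse

L_SN = L[n:, :n]            # Southern output per unit of Northern final demand
L_SS_block = L[n:, n:]      # South-South block of the global inverse

# Inverse of the South-only system: within-South linkages only.
L_SS_domestic = np.linalg.inv(np.eye(n) - A_SS)
```

The South-South block of the global inverse is element-wise at least as large as the South-only inverse, because it additionally contains production chains that leave the South and return via Northern sectors.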

## Dependency Share Calculation

In the main paper, we have defined **dependency shares** as the proportion of value added (income and profits) in the Global South that depends on trade relationships with the Global North. This captures the economic vulnerability of Southern economies to potential demand reductions in Northern economies. In our analysis, we distinguish between *final demand dependency* and *total trade dependency*. Both concepts are defined formally below. To this end, we use the following notation:

- $S$ = set of Global South countries
- $N$ = set of Global North countries
- $\mathbf{v}_S$ = vector of value-added coefficients for Global South sectors
- $\mathbf{L}_{SS}$ = Global South domestic Leontief inverse matrix  
- $\mathbf{y}_{S \leftarrow N}$ = vector of Global North final demand for Global South products
- $\mathbf{z}_{S \leftarrow N}$ = vector of Global North intermediate demand for Global South products
- $V_S^{total}$ = total value added of the specified type in the Global South

A key element of our approach is that we focus on the domestic production chains within the Global South that are triggered by external (Global North) demand changes. This is why we use the Global South domestic Leontief inverse $\mathbf{L}_{SS}$ rather than the full global Leontief matrix. 

### Final Demand Dependency

The *final demand dependency* quantifies how much value added in the Global South depends on exports of goods and services to final consumers in the Global North (such as households and governments):

$$DS^{final} = \frac{\mathbf{v}_S \cdot \mathbf{L}_{SS} \cdot \mathbf{y}_{S \leftarrow N}}{V_S^{total}} \times 100$$

This calculation proceeds in three steps:

1. **External demand identification**: $\mathbf{y}_{S \leftarrow N}$ captures all final demand from Global North entities for Global South products and services

2. **Domestic production requirements**: $\mathbf{L}_{SS} \cdot \mathbf{y}_{S \leftarrow N}$ calculates the total production (direct plus indirect) required within the Global South to satisfy this external final demand, accounting for domestic supply chain linkages

3. **Value added generation**: $\mathbf{v}_S \cdot (\mathbf{L}_{SS} \cdot \mathbf{y}_{S \leftarrow N})$ applies the value-added coefficients to determine how much of the specified primary input (employee compensation, mixed income, or operating surplus) is generated through these production processes
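
The three steps can be sketched as follows. This is an illustrative example with two hypothetical Southern sectors, not our published implementation; in particular, it assumes that $\mathbf{L}_{SS}$ is obtained by inverting the South-only coefficient matrix.

```python
import numpy as np

# Hypothetical data for two Southern sectors (not EORA data).
A_SS = np.array([[0.25, 0.10],
                 [0.10, 0.20]])          # within-South technical coefficients
x_S = np.array([200.0, 150.0])           # total output of Southern sectors
va_S = np.array([90.0, 75.0])            # value added of one primary input type

# Step 1: external final demand of the North for Southern products.
y_S_from_N = np.array([30.0, 20.0])

# Step 2: Southern production (direct plus indirect) needed to satisfy it.
L_SS = np.linalg.inv(np.eye(2) - A_SS)   # assumption: South-only inverse
output_required = L_SS @ y_S_from_N

# Step 3: value added generated by that production.
v_S = va_S / x_S                         # value-added coefficients
va_generated = v_S @ output_required

# Dependency share in percent of total Southern value added.
ds_final = va_generated / va_S.sum() * 100
```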

### Total Trade Dependency

The *total trade dependency* includes both final demand and intermediate demand from the Global North:

$$DS^{total} = \frac{\mathbf{v}_S \cdot \mathbf{L}_{SS} \cdot (\mathbf{y}_{S \leftarrow N} + \mathbf{z}_{S \leftarrow N})}{V_S^{total}} \times 100$$

where $\mathbf{z}_{S \leftarrow N}$ represents Global North purchases of Global South products for use as intermediate inputs in Northern production processes.

This can be decomposed into two components:

$$DS^{total} = DS^{final} + DS^{intermediate}$$

where:

$$DS^{intermediate} = \frac{\mathbf{v}_S \cdot \mathbf{L}_{SS} \cdot \mathbf{z}_{S \leftarrow N}}{V_S^{total}} \times 100$$
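
Because the dependency share is linear in the demand vector, the decomposition holds exactly. The following sketch verifies this with hypothetical numbers; the helper `dependency_share` is introduced here for illustration only and is not a function from our published code.

```python
import numpy as np

def dependency_share(v, L, demand, va_total):
    """Percent of total value added generated by the given demand vector."""
    return float(v @ L @ demand) / va_total * 100.0

# Hypothetical inputs for two Southern sectors (not EORA data).
A_SS = np.array([[0.25, 0.10],
                 [0.10, 0.20]])
L_SS = np.linalg.inv(np.eye(2) - A_SS)   # assumption: South-only inverse
v_S = np.array([0.45, 0.50])             # value-added coefficients
va_total = 165.0                         # total Southern value added

y_S_from_N = np.array([30.0, 20.0])      # Northern final demand
z_S_from_N = np.array([40.0, 10.0])      # Northern intermediate demand

ds_final = dependency_share(v_S, L_SS, y_S_from_N, va_total)
ds_intermediate = dependency_share(v_S, L_SS, z_S_from_N, va_total)
ds_total = dependency_share(v_S, L_SS, y_S_from_N + z_S_from_N, va_total)
```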

### Implementation Details

The computational implementation of these equations can be found in the code that we publish alongside this paper as Gräbner-Radkowitsch and Strunk [-@dependencecode]. When aligning the code with the equations above, it may be useful to keep the following aspects in mind:

- Sectors are organized by country-sector pairs (e.g., "Brazil-Agriculture", "Brazil-Manufacturing")
- The Global South domestic Leontief inverse $\mathbf{L}_{SS}$ includes all within-South linkages while excluding North-South production dependencies
- Value-added coefficients $\mathbf{v}_S$ are calculated as ratios of primary inputs to total sectoral output
- Final and intermediate demands are extracted from the appropriate sub-matrices of the full global input-output table

These specificities are due to the way the data is organized in the EORA tables we use.
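
To illustrate how such sub-matrices can be selected from a table indexed by country-sector pairs, consider the following sketch. The country codes, sector names, and random numbers are hypothetical, and the masking approach is only one possible way to do this; it is not necessarily how our published code or the EORA tooling proceeds.

```python
import numpy as np

# Hypothetical country-sector labels for a tiny three-country table.
labels = [(c, s) for c in ["DEU", "BRA", "KEN"]
          for s in ["Agriculture", "Manufacturing"]]
countries = np.array([c for c, _ in labels])

north = ["DEU"]                # assumed group membership, for illustration
south = ["BRA", "KEN"]
is_north = np.isin(countries, north)
is_south = np.isin(countries, south)

# Random illustrative transactions and outputs (not EORA data).
rng = np.random.default_rng(1)
Z = rng.uniform(1.0, 10.0, size=(6, 6))
x = Z.sum(axis=1) + rng.uniform(50.0, 100.0, size=6)
A = Z / x                                      # technical coefficients

# Within-South block and its Leontief inverse (South-only linkages).
A_SS = A[np.ix_(is_south, is_south)]
L_SS = np.linalg.inv(np.eye(is_south.sum()) - A_SS)

# Southern sales of intermediates to Northern sectors, summed per seller.
Z_S_to_N = Z[np.ix_(is_south, is_north)]
z_S_from_N = Z_S_to_N.sum(axis=1)
```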

## Limitations and Interpretative Caveats

The static nature of I-O analysis imposes several important limitations on interpretation. The analysis assumes fixed technical coefficients, meaning that the input requirements per unit of output remain constant regardless of changes in relative prices or technological conditions. Additionally, the analysis assumes zero price elasticity of final demand in the short run, implying that quantity demanded does not respond immediately to price changes. These assumptions make I-O analysis most suitable for examining short-term impacts and immediate structural dependencies, while longer-term dynamic adjustments requiring behavioral responses fall outside the scope of this methodology. 

For the present context this means that the computed dependency shares should be considered a rough indication for the short-term vulnerability of a country, rather than a precise estimate. It is likely that some countries are more capable of adapting their productive structure to the new demand conditions, meaning the actual medium-term impact on their value added might be much smaller than suggested by the dependency shares. At the same time, other countries might face particular challenges when transforming their economies. Such differences are crucial to understand, but I-O analysis is not the right tool to study them. 

# Appendix C: Regression Details and Diagnostics {#sec-diagnostics}

```{python}
#| echo: false
#| output: false
# Execute the R script to conduct robustness checks

try:
    subprocess.run(
        ["Rscript", "R/RobustEstimation.R"],
        capture_output=True,
        text=True,
        check=True,
        cwd=str(here()),  # run from the project root so relative paths resolve
    )
except subprocess.CalledProcessError as e:
    # Surface the R error message, which capture_output would otherwise hide
    print(f"Error while running R/RobustEstimation.R: {e}\n{e.stderr}")
    raise
```

```{python}
#| echo: false

# Load combined regression results
with open(here('output/combined_regression_results.json'), 'r') as f:
    combined_results = json.load(f)

# Extract key statistics for inline use
outlier_stats = combined_results['outlier_analysis']
comparison_stats = combined_results['comparison']

# Key numbers for text
n_outliers = len(outlier_stats['outlier_statistics'])
max_std_residual = outlier_stats['max_standardized_residual']
max_percent_diff = outlier_stats['max_percent_difference']
intercept_percent_diff = outlier_stats['intercept_percent_diff']
trade_percent_diff = outlier_stats['trade_percent_diff']

# Extract specific country information
outlier_countries = outlier_stats['outlier_statistics']
ethiopia_residual = [x['standardized_residual'] for x in outlier_countries if x['country'] == 'ETH'][0]
ethiopia_weight = [x['mm_weight'] for x in outlier_countries if x['country'] == 'ETH'][0]

# Trade coefficient values
trade_ols = outlier_stats['trade_coef_ols']
trade_mm = outlier_stats['trade_coef_mm']
trade_exclusion = outlier_stats['trade_coef_exclusion']

# For the regression table
ols_coeffs = pd.DataFrame(combined_results['ols']['coefficients'])
mm_coeffs = pd.DataFrame(combined_results['mm']['coefficients'])
exclusion_coeffs = pd.DataFrame(combined_results['exclusion']['coefficients'])
```

In our regressions we use the following variables:

| Variable | Definition | Source | Unit |
|----------|------------|---------|------|
| Trade dependency | Average dependency across three primary inputs | Authors' calculations from EORA26 | Percentage |
| Trade openness | (Exports + Imports) / GDP | World Bank WDI | Percentage |
| GDP per capita | GDP per capita, PPP (constant 2017 USD) | World Bank WDI | USD |
| Population | Total population | World Bank WDI | Persons |
| Natural resources | Total natural resources rents (% of GDP) | World Bank WDI | Percentage |

## Details on the OLS Baseline Regression

![OLS Regression Diagnostics](../output/diagnostics_no_eci.png){#fig-ols-diagnostics width=100%}

@fig-ols-diagnostics presents the standard diagnostic plots for the baseline OLS regression. The *residuals vs fitted* plot (top left), the most common diagnostic plot for linear regression, shows the residuals (the differences between observed and fitted values) on the y-axis and the fitted values (or 'model predictions') on the x-axis. Ideally, this plot does not exhibit any obvious structure. In the present case, the points are scattered roughly evenly around zero with no clear pattern, yet the plot also points to several observations with large residuals, particularly Ethiopia, Libya, and Papua New Guinea.
While this indicates that the linear model assumptions are reasonably met, it also suggests caution regarding the influence of these outliers on the overall result.

The *normal Q-Q* plot (top right) compares the distribution of the residuals of our model (y-axis) against what we would expect from a normal distribution (x-axis). In the ideal case of perfectly normally distributed residuals, the points would lie on the 45-degree line. 
In our case, the plot suggests that in most areas the residuals follow approximately a normal distribution, as most points lie close to the diagonal line. Yet, it also reveals deviations from normality in the tails, suggesting, again, the presence of outliers. 

The *scale-location* plot (bottom left) displays the square root of the absolute standardized residuals against the fitted values. This plot is used to assess whether the variance of the residuals is constant across all prediction levels. In our case, the plot shows relatively constant variance across all fitted values, suggesting that the homoscedasticity (equal variance) assumption is met. 

Finally, the *residuals vs leverage* plot (bottom right) helps identify influential observations by plotting standardized residuals against leverage, which is a measure of how far a country's characteristics deviate from the average. In the present case, the plot identifies several influential observations, including Ethiopia, Libya, and Papua New Guinea. While none of them exceeds the conventional threshold for problematic influence (Cook's distance > $4/n$), their presence suggests complementing OLS with more robust estimation methods to corroborate our results.
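
For readers unfamiliar with these quantities, the following sketch computes leverage and Cook's distance by hand and applies the $4/n$ rule of thumb. It uses synthetic data with one planted unusual observation and is not the code behind @fig-ols-diagnostics, which is produced by our R script.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (hypothetical, not the paper's sample): a linear relation
# plus one planted observation with an unusual x-value and a large error.
n = 30
x = np.append(np.linspace(-2.0, 2.0, n - 1), 5.0)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[-1] -= 10.0

X = np.column_stack([np.ones(n), x])       # design matrix with intercept
p = X.shape[1]                             # number of estimated parameters

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s2 = resid @ resid / (n - p)               # residual variance estimate

# Leverage: diagonal of the hat matrix H = X (X'X)^{-1} X'
leverage = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)

# Cook's distance combines residual size and leverage.
cooks_d = resid**2 / (p * s2) * leverage / (1.0 - leverage) ** 2
flagged = np.flatnonzero(cooks_d > 4.0 / n)   # rule-of-thumb threshold
```

The planted observation (the last one) receives by far the largest Cook's distance, because it combines high leverage with a large residual.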

## Robust MM Estimation

![MM-Estimator Diagnostics](../output/mm_estimator_diagnostics.png){#fig-mm-diagnostics width=100%}

To this end, we use an MM-estimator, which employs a two-step process. First, it computes an initial robust estimate using an S-estimator, which finds the parameter values that minimize a robust measure of scale rather than the sum of squared residuals used in ordinary least squares (OLS). Second, it improves efficiency through M-estimation: the parameters are re-estimated by minimizing a robust loss function that down-weights large residuals (the "M" stands for "maximum likelihood-type").

A key property of this procedure is its high breakdown point. The breakdown point is a theoretical measure of the share of outliers an estimator can tolerate before failing completely; it refers to the worst-case scenario in which outliers are positioned to cause maximum disruption to the estimation process. A breakdown point of 50% means the method can handle up to half of the observations being outliers while still producing meaningful results, which is the theoretical maximum for any estimator. In practice, the MM-estimator's high breakdown point allows it to identify core relationships even when a substantial minority of countries exhibit highly unusual patterns.

OLS, by contrast, treats all observations equally and can be substantially influenced by outliers (essentially having a breakdown point of zero). The MM-estimator instead assigns a weight to each observation based on how well it conforms to the overall pattern. Countries with unusual dependency patterns receive reduced weights, which prevents them from disproportionately influencing the estimated relationships across the majority of countries. This is particularly useful in cross-country analyses, where some nations may have unique economic structures that do not represent broader global patterns. 
Thus, the MM-estimator aims to preserve genuine relationships while maintaining statistical reliability without requiring manual exclusion of specific countries. 
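
The down-weighting logic can be illustrated with a simplified sketch. Note the assumptions: the code below implements only an iteratively reweighted least squares step with Tukey's bisquare weights, started from OLS and with a scale that is re-estimated in every iteration. A full MM-estimator, as used in our R script, would instead start from an S-estimate and keep its scale fixed. The data are synthetic.

```python
import numpy as np

def robust_weights(X, y, beta, c=4.685):
    """Tukey bisquare weights based on residuals scaled by the MAD."""
    resid = y - X @ beta
    scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))
    u = resid / scale
    w = (1.0 - (u / c) ** 2) ** 2
    w[np.abs(u) >= c] = 0.0          # observations far from the fit get weight 0
    return w

def irls_bisquare(X, y, n_iter=50):
    """Simplified M-step: weighted least squares with updated weights."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS start (assumption)
    for _ in range(n_iter):
        sw = np.sqrt(robust_weights(X, y, beta))
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta, robust_weights(X, y, beta)

# Synthetic data: true slope 2, one gross outlier at position 15.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=30)
y[15] += 15.0

X = np.column_stack([np.ones_like(x), x])
beta_robust, weights = irls_bisquare(X, y)
```

In this example the planted outlier ends up with a weight of zero, analogous to an observation that is fully discounted by the robust fit, while the remaining observations keep weights close to one.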

@fig-mm-diagnostics shows the diagnostic plots for the MM-estimator, which automatically down-weights influential but atypical observations and is therefore used when outliers are suspected of having a problematic influence on the results. The figure suggests that this robust method indeed performs better than standard OLS: the residuals vs fitted plot shows an even more uniform scatter around zero, and the Q-Q plot indicates a more normal residual distribution, with only three observations acting as outliers. These are also clearly visible in the weights plot (bottom right): Ethiopia (given a weight of zero), Libya (with a weight of about 0.18), and Fiji (with a weight of about 0.93). 

## OLS regression with manual outlier removal

As a third estimation, instead of relying on the weighting approach of the MM-estimator, we manually removed the outliers it detected (Ethiopia, Libya, and Fiji) and re-estimated the model with OLS.

![OLS Diagnostics after Outlier Exclusion](../output/exclusion_model_diagnostics.png){#fig-excl-ols-diagnostics width=100%}

@fig-excl-ols-diagnostics presents the diagnostic plots for the OLS model after explicitly excluding the outliers. The residuals vs fitted plot (top left) shows substantially improved behavior compared to the original OLS model, with residuals more evenly scattered around zero and no apparent systematic patterns. The normal Q-Q plot (top right) shows that the residuals follow an approximately normal distribution, with most points lying close to the diagonal line and fewer extreme deviations than in the full sample. The scale-location plot (bottom left) indicates more stable variance across fitted values after outlier removal, suggesting that the homoscedasticity assumption is met even better. The residuals vs leverage plot (bottom right) still identifies some influential observations (Papua New Guinea, Seychelles, Azerbaijan), but none exceeds the conventional threshold for problematic influence (Cook's distance > 4/n). Notably, removing the three extreme outliers has substantially improved all model diagnostics while preserving the core relationships.
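
To make the 4/n rule of thumb concrete, the following sketch computes Cook's distance for a regression with a single regressor and flags observations above the threshold. It is a simplified illustration with made-up data: our models include several regressors, for which the leverage values come from the full hat matrix, and the function and variable names are assumptions introduced only for this example.

```python
def cooks_distances(x, y):
    """Cook's distance for each observation in a one-regressor OLS fit y = a + b*x."""
    n = len(x)
    p = 2  # estimated parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e**2 for e in residuals) / (n - p)
    # Leverage of each observation in simple linear regression
    leverage = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
    return [
        (e**2 / (p * mse)) * (h / (1 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]


# Made-up data (assumption): the last observation lies far from the others in x
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 8.1, 9.0, 12.0]

distances = cooks_distances(x, y)
threshold = 4 / len(x)
flagged = [i for i, d in enumerate(distances) if d > threshold]
print(f"threshold = {threshold:.2f}, flagged observations: {flagged}")
```

Only the high-leverage observation at the end of the sample exceeds the threshold here; observations with moderate residuals and ordinary leverage stay well below it.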

## Model comparison

The comparison across estimation methods suggests that our findings are consistent regardless of how outliers are treated (@tbl-three-method-comparison). The trade openness coefficient, our central result, remains highly stable across the OLS, MM-estimator, and exclusion approaches (
`{python} f"{combined_results['ols']['coefficients'][1]['estimate']:.3f}"`, 
`{python} f"{combined_results['mm']['coefficients'][1]['estimate']:.3f}"`, and 
`{python} f"{combined_results['exclusion']['coefficients'][1]['estimate']:.3f}"`
respectively), confirming that countries with higher trade intensity tend to be more vulnerable to Global North demand reductions. The GDP per capita and natural resource coefficients show similar patterns across methods, while the population coefficient exhibits some variation but preserves the same directional relationship. 

Both robust approaches produce smaller standard errors than OLS and achieve higher R-squared values (
`{python} f"{combined_results['mm']['r_squared']:.3f}"` 
and 
`{python} f"{combined_results['exclusion']['r_squared']:.3f}"` 
versus 
`{python} f"{combined_results['ols']['r_squared']:.3f}"`
), indicating improved model fit when the three identified outliers (Ethiopia, Libya, and Fiji) are either down-weighted or excluded. 

```{python}
#| label: tbl-three-method-comparison
#| tbl-cap: "Comprehensive Coefficient Comparison Across Estimation Methods. MM-estimator and explicit exclusion both provide robust estimates with improved precision compared to OLS."
#| echo: false

# Display names for the regression terms (defined once, outside the loop)
var_display_names = {
    '(Intercept)': 'Constant',
    'trade_pct_gdp': 'Trade (% of GDP)',
    'log(gdp_per_capita_ppp)': 'Log GDP per capita',
    'log(population)': 'Log Population',
    'total_natural_resources_rents_pct_gdp': 'Natural resources (% of GDP)'
}

# Extract coefficient data for three-way comparison
# (relies on all three models listing their coefficients in the same order)
comparison_data = []

for i, ols_coef in enumerate(combined_results['ols']['coefficients']):
    mm_coef = combined_results['mm']['coefficients'][i]
    exclusion_coef = combined_results['exclusion']['coefficients'][i]

    # Fall back to the raw term name if no display name is defined
    var_name = ols_coef['variable']
    var_display = var_display_names.get(var_name, var_name)
    
    comparison_data.append([
        var_display,
        f"{ols_coef['estimate']:.3f}{ols_coef['stars']}",
        f"({ols_coef['std_error']:.3f})",
        f"{mm_coef['estimate']:.3f}{mm_coef['stars']}",
        f"({mm_coef['std_error']:.3f})",
        f"{exclusion_coef['estimate']:.3f}{exclusion_coef['stars']}",
        f"({exclusion_coef['std_error']:.3f})"
    ])

# Add model statistics
comparison_data.extend([
    ['', '', '', '', '', '', ''],
    ['R-squared', f"{combined_results['ols']['r_squared']:.3f}", '', 
     f"{combined_results['mm']['r_squared']:.3f}", '', 
     f"{combined_results['exclusion']['r_squared']:.3f}", ''],
    ['Observations', f"{combined_results['ols']['n_obs']}", '', 
     f"{combined_results['mm']['n_obs']}", '', 
     f"{combined_results['exclusion']['n_obs']}", ''],
    ['Outlier treatment', 'Included', '', 'Downweighted', '', 'Excluded', ''],
    ['Countries affected', '-', '', 
     ', '.join(combined_results['mm']['outliers_identified']), '', 
     ', '.join(combined_results['exclusion']['excluded_countries']), '']
])

# Create DataFrame
three_method_table = pd.DataFrame(comparison_data, columns=[
    'Variable', 'OLS Coef.', 'OLS SE', 'MM Coef.', 'MM SE', 'Excl. Coef.', 'Excl. SE'
])

three_method_table
```

# Appendix D: Remarks on the temporal stability of the main results

![Temporal Robustness of Global South Trade Dependencies](../output/Appendix_temporal_robustness.png){#fig-temporal-robustness width=100%}

The main analysis of the paper uses data for the most recent year available. 
Here we provide some remarks on the stability of our key results over time. @fig-temporal-robustness gives a first visual impression of the stability of the total dependency shares we have computed, and @tbl-temporal-summary complements this with the mean, standard deviation, minimum, and maximum of these shares. The coefficient of variation in the last column is the ratio of the standard deviation to the mean; its very low values indicate strong temporal stability of our results.
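
As a brief illustration of how the coefficient of variation is obtained, consider the sketch below. The yearly shares are hypothetical values chosen for the example, not our results, and the use of the sample standard deviation (rather than the population version) is an assumption.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Ratio of the (sample) standard deviation to the mean."""
    return stdev(values) / mean(values)


# Hypothetical dependency shares for eight consecutive years (assumption)
shares = [0.21, 0.22, 0.20, 0.21, 0.22, 0.21, 0.20, 0.21]

cv = coefficient_of_variation(shares)
print(f"mean = {mean(shares):.3f}, sd = {stdev(shares):.4f}, CV = {cv:.3f}")
```

A value this close to zero means that the yearly fluctuations are small relative to the average level of the share.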

```{python}
#| label: tbl-temporal-summary
#| tbl-cap: "Temporal Robustness Summary Statistics: Trade Dependencies (2008-2015). CV = Coefficient of Variation. Lower CV values indicate more stable relationships over time."
#| echo: false

# Read the temporal summary table
temporal_summary = pd.read_csv(here("data", "tidy", "temporal_summary_table.csv"))

# Display the table
temporal_summary
```


Finally, @tbl-temporal-stability provides further evidence for this assessment through formal trend testing.
All p-values are well above the conventional threshold of $0.05$, so we find no evidence of an underlying trend in the dependency shares. This suggests that our results reflect persistent features rather than temporary phenomena.
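
For readers unfamiliar with trend testing, the sketch below shows one common option, the Mann-Kendall test with a normal approximation. This is an illustrative assumption and not necessarily the procedure behind the reported p-values; the series are hypothetical, and with only eight observations the normal approximation is rough.

```python
from statistics import NormalDist


def mann_kendall_p_value(series):
    """Two-sided p-value of the Mann-Kendall trend test (normal approximation, no ties)."""
    n = len(series)
    # S counts later-minus-earlier increases minus decreases over all pairs
    s = sum(
        (later > earlier) - (later < earlier)
        for i, earlier in enumerate(series)
        for later in series[i + 1:]
    )
    variance = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / variance**0.5  # continuity correction
    elif s < 0:
        z = (s + 1) / variance**0.5
    else:
        z = 0.0
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical yearly shares (assumption): one fluctuating, one steadily rising
stable_series = [0.210, 0.218, 0.204, 0.213, 0.221, 0.207, 0.201, 0.215]
trending_series = [0.20, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27]

p_stable = mann_kendall_p_value(stable_series)
p_trend = mann_kendall_p_value(trending_series)
print(f"fluctuating series: p = {p_stable:.3f}; rising series: p = {p_trend:.4f}")
```

The fluctuating series yields a large p-value, so the hypothesis of no trend is not rejected, whereas the steadily rising series yields a p-value far below 0.05.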


```{python}
#| label: tbl-temporal-stability
#| tbl-cap: "Temporal Stability Test Results: Trend Analysis of Trade Dependencies (2008-2015). Tests whether dependency shares show significant trends over time. Stability Assessment: 'Stable' indicates no significant temporal trend (p > 0.05)."
#| echo: false

# Read the temporal stability table
temporal_stability = pd.read_csv(here("data", "tidy", "temporal_stability_table.csv"))

# Display the table
temporal_stability
```

# References

::: {#refs}
:::
